Single-channel singing voice separation has been considered a difficult task, as it requires predicting two different audio sources independently from mixed vocal and instrument sounds recorded by a single microphone. We propose a new singing voice separation approach based on the curriculum learning framework, in which learning starts with only easy examples and task difficulty is then gradually increased. In this study, we regard data in which a single source is clearly dominant as easy cases and all other data as difficult cases. To quantify the dominance between the two sources, we define a dominance factor that determines the difficulty level according to the relative intensity of the vocal sound and the instrument sound. If the factor indicates that a given sample is clearly dominated by a single source, the sample is regarded as an easy case; otherwise, it is a difficult case. Early stages of learning focus on easy cases, allowing the model to rapidly learn the overall characteristics of each source. Later stages handle difficult cases, allowing more careful and sophisticated learning. In experiments conducted on three song datasets, the proposed approach demonstrated superior performance compared to conventional approaches.
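As an illustration of the easy-versus-difficult split described above, the following is a minimal sketch and not the formulation used in this study. It assumes, hypothetically, that the dominance factor is the vocal share of the total signal energy, that a sample is easy when this share is far from 0.5, and that later stages add difficult samples to the easy ones rather than replacing them. The names `dominance_factor`, `is_easy`, `curriculum_stages`, and the `margin` thresholds are all illustrative.

```python
def dominance_factor(vocal, instrument, eps=1e-12):
    """Hypothetical dominance factor: vocal share of total energy, in [0, 1].

    Values near 1 mean the vocal dominates, values near 0 mean the
    instrument dominates, and values near 0.5 mean neither dominates.
    """
    vocal_energy = sum(x * x for x in vocal)
    instrument_energy = sum(x * x for x in instrument)
    return vocal_energy / (vocal_energy + instrument_energy + eps)


def is_easy(factor, margin):
    """Assumed rule: a sample is easy if one source clearly dominates."""
    return abs(factor - 0.5) >= margin


def curriculum_stages(examples, margins):
    """Yield the training subset for each stage.

    `examples` is a list of (vocal, instrument) sample pairs. `margins`
    should decrease so that harder samples join in later stages.
    """
    for margin in margins:
        yield [
            (vocal, instrument)
            for vocal, instrument in examples
            if is_easy(dominance_factor(vocal, instrument), margin)
        ]


# Toy data: one vocal-dominant, one instrument-dominant, one balanced sample.
examples = [
    ([1.0, -1.0], [0.1, 0.1]),
    ([0.1, 0.1], [1.0, 1.0]),
    ([0.5, 0.5], [0.5, -0.5]),
]
stages = list(curriculum_stages(examples, margins=[0.4, 0.0]))
print([len(stage) for stage in stages])
```

In this toy run the first stage keeps only the two samples with a clearly dominant source, and the second stage, with the margin relaxed to zero, also admits the balanced sample.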